This is an anonymized dump of all user-contributed content on the Stack Exchange network. Each site is formatted as a separate archive consisting of XML files zipped via 7-zip using bzip2 compression. Each site archive includes Posts, Users, Votes, Comments, PostHistory and PostLinks. For complete schema information, see the included readme.txt.
All user content contributed to the Stack Exchange network is cc-by-sa 3.0 licensed, intended to be shared and remixed. We even provide all our data as a convenient data dump.
But our cc-by-sa 3.0 licensing, while intentionally permissive, does require attribution:
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Specifically the attribution requirements are as follows:
Visually display or otherwise indicate the source of the content as coming from the Stack Exchange Network. This requirement is satisfied with a discreet text blurb, or some other unobtrusive but clear visual indication.
Ensure that any Internet use of the content includes a hyperlink directly to the original question on the source site on the Network (e.g., http://stackoverflow.com/questions/12345).
Visually display or otherwise clearly indicate the author names for every question and answer used.
I cannot use the torrent (I am in Cuba).
The only option for us is to download the files with wget.
However, when we try to download the full Stack Exchange data dump, we get this message:
"total size of requested files (44 GB) is too large for zip-on-the-fly"
Could you please remove this limitation?
July 9, 2017 Subject:
This dump was used to generate offline static versions of the Stack Overflow websites as part of the kiwix.org project.
Also note: information about sites and their corresponding file names can be obtained indirectly via the /sites API query (http://api.stackexchange.com/docs/sites). For every site, the file name matches either the primary URL of that site *or* the URL of one of that site's aliases. There are no exceptions other than SO, whose dump is uniquely split among multiple files.
Sites that are very new relative to the archive date are not included in the dump.
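The naming rule above can be checked mechanically. Below is a minimal sketch that takes a site's primary URL and alias URLs, for example the `site_url` and `aliases` fields returned by /sites, and keeps whichever candidate name actually appears in the dump's file list. It rests on several assumptions. Each archive is assumed to be named `<host>.7z`, since the archives are 7-zip files. The Stack Overflow split is guessed as `stackoverflow.com-<Table>.7z`, using the six tables from the description. The helper names `candidate_archive_names` and `find_archives` are hypothetical.

```python
from urllib.parse import urlparse

# Table files listed in the dump description (SO's split is assumed to follow these).
TABLES = ["Posts", "Users", "Votes", "Comments", "PostHistory", "PostLinks"]


def candidate_archive_names(site_url: str, aliases: list[str]) -> list[str]:
    """Return archive names a site's dump might use (hypothetical naming scheme).

    Assumes ordinary sites map to '<host>.7z' and Stack Overflow is split
    into one '<host>-<Table>.7z' file per table.
    """
    # Hosts of the primary URL and every alias, de-duplicated, order preserved.
    hosts = list(dict.fromkeys(urlparse(u).netloc for u in [site_url, *aliases]))
    names = []
    for host in hosts:
        if host == "stackoverflow.com":
            # SO is the one site whose dump is split among multiple files.
            names.extend(f"{host}-{table}.7z" for table in TABLES)
        else:
            names.append(f"{host}.7z")
    return names


def find_archives(site_url: str, aliases: list[str], available: list[str]) -> list[str]:
    """Keep only the candidate names that actually appear in the dump's file list."""
    present = set(available)
    return [n for n in candidate_archive_names(site_url, aliases) if n in present]
```

For a hypothetical site `https://example.stackexchange.com` with the alias `https://example-alias.com`, a file list containing `example-alias.com.7z` yields `["example-alias.com.7z"]`. An empty result is what you would expect for a site too new to be included in the dump.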
March 24, 2017 Subject:
June 28, 2016 Subject:
Torrent out of date
It looks like the torrent is out of date again; I'm stuck at 800 MB while it continually tries to download files.
March 18, 2016 Subject:
.torrent is fixed
The former limitation of 25 gigabytes for a torrent has been relaxed for this item, and the torrent is working again.
If you want to parse these large XML files, you need to use a streaming parser. I'm not surprised that an XML editor would have problems with a 40 gigabyte XML file! Most XML parsing software libraries have a streaming option.
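As one illustration of the streaming approach, Python's standard `xml.etree.ElementTree.iterparse` reads the file incrementally instead of loading it all into memory. The sketch below assumes each post is a `<row .../>` element whose fields are XML attributes, such as `Id` and `PostTypeId`; readme.txt is the authority on the exact schema. The function name `count_posts_by_type` is hypothetical. Clearing the root after each row is what keeps memory flat on a 40 GB file.

```python
import io
import xml.etree.ElementTree as ET


def count_posts_by_type(source):
    """Stream a Posts.xml-style file and count <row> elements per PostTypeId.

    `source` may be a filename or a binary file object. Assumes each post is
    a <row .../> element carrying its fields as attributes.
    """
    counts = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            post_type = elem.get("PostTypeId", "unknown")
            counts[post_type] = counts.get(post_type, 0) + 1
            # Discard finished rows so the in-memory tree never grows
            # with the size of the file.
            root.clear()
    return counts


# Tiny inline sample in the same shape; pass a real path such as "Posts.xml" instead.
sample = b'<posts><row Id="1" PostTypeId="1"/><row Id="2" PostTypeId="2"/></posts>'
print(count_posts_by_type(io.BytesIO(sample)))  # → {'1': 1, '2': 1}
```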
March 5, 2016 Subject:
How to open posts.xml file?
I have downloaded and extracted the posts.xml file of Stack Overflow. The file is around 40 GB, and I am not able to open it in XML editors. Can someone please suggest how to open or parse this huge file?
January 11, 2016 Subject:
What sense do we make of the files?
I see over 300 files, whereas there are only 6 XML files (large ones) as far as I remember.
I cannot download the torrent version either - can someone help make sense of the files?
December 30, 2015 Subject:
torrent out of date
It stopped at around 70% for a couple of days and could never move forward.
May 22, 2015 Subject:
Thanks for sharing
Thanks for sharing the community data. It will greatly benefit research groups.
May 3, 2015 Subject:
Great data set. Thank you for sharing.
I see only March data. How can I get April data?
What about January and February?
April 10, 2015 Subject:
I also tried a couple of times. It failed at the same point.
But then I tried when I logged in, and I was able to download the whole file :-)
April 3, 2015 Subject:
Stuck at 70.7% complete via uTorrent, argh!
This should be made available via FTP!
March 27, 2015 Subject:
File is broken
At the top right corner of this page there is a link to a zip archive. I've downloaded it twice (on different machines, in different countries). The file was always broken.
The torrent gets stuck at 70.8%.
Can anyone help to get this file?
March 25, 2015 Subject:
It stopped at around 70%.
March 24, 2015 Subject:
Seeds for 3/16/15 version?
Everybody's stuck at 70.7%
February 7, 2015 Subject:
Thanks and tests
Thanks for the September update; I'm eager to see the next one. Has anyone tried importing this data into a StackExchange instance?
It's fun to see how small the whole SE network is after all: only a few GB compressed. Wikimedia project dumps compress very well too, but they're still much bigger (though they still fit on a common hard disk!).
August 19, 2014 Subject:
When will the latest dump from Stack Overflow be posted here?
July 26, 2014 Subject: